A modular RNN-based method for continuous Mandarin speech recognition
Abstract
A new modular recurrent neural network (MRNN)-based method for continuous Mandarin speech recognition (CMSR) is proposed. The MRNN recognizer is composed of four main modules. The first is a sub-MRNN module whose function is to generate discriminant functions for all 412 base-syllables. It accomplishes the task by using four recurrent neural network (RNN) submodules. The second is an RNN module designed to detect syllable boundaries, providing timing cues that help solve the time-alignment problem. The third is also an RNN module, whose function is to generate discriminant functions for 143 intersyllable diphone-like units to compensate for the intersyllable coarticulation effect. The fourth is a dynamic programming (DP)-based recognition search module. Its function is to integrate the other three modules and solve the time-alignment problem in order to generate the recognized base-syllable sequence. A new multilevel pruning scheme designed to speed up the recognition process is also proposed. The whole MRNN can be trained by a sophisticated three-stage minimum classification error/generalized probabilistic descent (MCE/GPD) algorithm. Experimental results showed that the proposed method performed better than the maximum likelihood (ML)-trained hidden Markov model (HMM) method and was comparable to the MCE/GPD-trained HMM method. The multilevel pruning scheme was also found to be very efficient.
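The abstract does not give the details of the DP-based search, so the following is only a minimal sketch of the general idea: a dynamic program that combines per-frame syllable scores with per-frame boundary scores to choose both the segmentation and the syllable labels. The function name `dp_decode`, the additive log-score combination, and the `min_len`/`max_len` duration limits are all assumptions made for illustration. The diphone-like unit scores and the multilevel pruning scheme are left out entirely.

```python
import numpy as np


def dp_decode(syll_scores, boundary_scores, min_len=1, max_len=None):
    """Illustrative DP search (hypothetical, not the paper's algorithm).

    syll_scores:     (T, K) array, log-score of syllable k at frame t.
    boundary_scores: (T,) array, log-score of a syllable boundary
                     placed immediately before frame t.
    Returns (segments, total_score), where segments is a list of
    (start_frame, end_frame_exclusive, syllable_index) tuples.
    """
    T, K = syll_scores.shape
    if max_len is None:
        max_len = T

    # Prefix sums: the score of frames [s, e) for every syllable is
    # cum[e] - cum[s], so each candidate segment costs O(K).
    cum = np.vstack([np.zeros((1, K)), np.cumsum(syll_scores, axis=0)])

    best = np.full(T + 1, -np.inf)  # best[e]: best score covering frames [0, e)
    best[0] = 0.0
    back = [None] * (T + 1)         # back[e]: (start, syllable) of last segment

    for end in range(1, T + 1):
        for start in range(max(0, end - max_len), end - min_len + 1):
            if best[start] == -np.inf:
                continue
            seg = cum[end] - cum[start]
            k = int(np.argmax(seg))
            score = best[start] + seg[k]
            if start > 0:
                # Internal boundary: add the boundary detector's cue.
                score += boundary_scores[start]
            if score > best[end]:
                best[end] = score
                back[end] = (start, k)

    if back[T] is None:
        raise ValueError("no segmentation satisfies the duration limits")

    # Trace back from the final frame to recover the segment sequence.
    segments = []
    t = T
    while t > 0:
        start, k = back[t]
        segments.append((start, t, k))
        t = start
    return segments[::-1], float(best[T])
```

In this sketch, a strong boundary score at a frame makes it cheaper to start a new syllable there, which is one simple way timing cues can guide time alignment. A real recognizer would also need pruning to keep the search over start frames tractable.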
Similar resources
An RNN-based preclassification method for fast continuous Mandarin speech recognition
A novel recurrent neural network-based (RNN-based) frontend preclassification scheme for fast continuous Mandarin speech recognition is proposed in this paper. First, an RNN is employed to discriminate each input frame for the three broad classes of initial, final, and silence. A finite state machine (FSM) is then used to classify the input frame into four states including three stable states o...
An RNN-Based Pre-classification Method for Fast Continuous Mandarin Speech Recognition
A novel RNN-based front-end pre-classification scheme for fast continuous Mandarin speech recognition is proposed in this paper. First, an RNN is employed to discriminate each input frame for the three broad classes of initial, final, and silence. A finite state machine (FSM) is then used to classify the input frame into four states including three stable states of Initial (I), Final (F), and Silenc...
Speech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...
An MRNN-based method for continuous Mandarin speech recognition
A new MRNN-based method for continuous Mandarin speech recognition is proposed. The system uses five RNNs to accomplish many subtasks separately and then combine them to integrally solve the problem. They include two RNNs for the discriminations of the two sub-syllable groups of 100 RFD initials and 39 CI finals, two RNNs for the generations of dynamic weighting functions for sub-syllable’s int...
Prosodic modeling of Mandarin speech and its application to lexical decoding
In this paper, a new RNN-based prosodic modeling method for Mandarin speech recognition is proposed. It is performed in the post-processing stage of the acoustic decoding aiming at detecting word boundaries for assisting in the lexical decoding. It employs a simple RNN to learn the relationship between input prosodic features, extracted from the input utterance with syllable boundaries provided...
Journal:
- IEEE Trans. Speech and Audio Processing
Volume 9, Issue
Pages -
Publication date: 2001